Set baseflow separation (for the Lyne-Hollick one-parameter digital recursive filter) and event delineation parameters (as in Wasko and Guo, 2022)
alp: alpha filter parameter, higher values “lower” the estimated baseflow (thus making it difficult to delineate events)
numpass: number of passes. Ladson et al. (2013) recommend 3 passes for daily data (default in baseflowB() function)
thresh: baseflow index threshold for event delineation, higher threshold values make it “easier” to delineate events
Code
alp <- 0.925
numpass <- 3
thresh <- 0.75
9.2.3 Baseflow separation
Perform baseflow separation. See Wasko and Guo (2022, Hydrological Processes) for recommendations on parameterization, as different algorithms and alpha values can produce different results. Section 4.1: “…users should not simply use recommended filter parameter values from literature in combination with any baseflow filter code without verification of their choice of filter parameter. As the digital baseflow filter is not tied to any physical realism (Nathan and McMahon, 1990) and a larger fractional baseflow may aid identification of events—even if this is not strictly baseflow as per its definition (Linsley et al., 1958)…”.
It is not actually necessary to do this separately, as baseflow separation is conducted in the event delineation function internally. But it is helpful to view the results and explore how different parameterizations yield different baseflow contributions.
Total daily streamflow in yield (mm), baseflow in yield, and fractional baseflow index over the period of record for site West Brook NWIS
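For intuition, the single-parameter Lyne-Hollick filter underlying baseflowB() can be sketched in base R. This one-pass version, with an assumed simple initialization, is an illustrative sketch rather than the package's implementation (which adds end-point reflection and multiple passes):

```r
# Sketch of one forward pass of the Lyne-Hollick filter (illustrative only;
# baseflowB() adds reflection at the series ends and runs multiple passes)
lh_pass <- function(q, alpha = 0.925) {
  qf <- numeric(length(q))  # filtered quickflow component
  qf[1] <- q[1]             # assumed initialization for illustration
  for (t in 2:length(q)) {
    qf[t] <- alpha * qf[t - 1] + (1 + alpha) / 2 * (q[t] - q[t - 1])
  }
  bf <- q - pmax(qf, 0)     # baseflow = total flow minus positive quickflow
  pmin(pmax(bf, 0), q)      # constrain baseflow to [0, total flow]
}

q <- c(1, 1, 5, 12, 7, 4, 2.5, 2, 1.8, 1.7)  # synthetic daily yield (mm)
round(lh_pass(q), 2)
```

Higher alpha pushes more of each rise into quickflow, which is why larger alpha values "lower" the estimated baseflow.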
9.2.4 Event identification
There are multiple options for methods of event identification. Section 4.2 in Wasko and Guo (2022): “It can be generalized that, if the aim of the analysis is to identify independent streamflow maxima then eventMaxima() and eventMinima() work well and indeed have been traditionally employed for this task. If identifying independent small events becomes difficult, or the aim is to identify wet spells, eventBaseflow() may be preferred.” In our case, small events are likely important, so we use eventBaseflow() with a relatively large value of BFI_Th to capture those small events.
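Conceptually, a BFI-threshold delineation flags maximal runs of days where the baseflow index sits below BFI_Th. The toy base-R version below illustrates that idea only; it is not hydroEvents' actual eventBaseflow() code, which also handles event merging and rising/falling limbs:

```r
# Toy BFI-threshold event delineation (illustration, not hydroEvents' code):
# an event is a maximal run of days with bfi < BFI_Th
delineate_events <- function(bfi, BFI_Th = 0.75) {
  flag <- bfi < BFI_Th
  r <- rle(flag)                    # run-length encode event/non-event days
  end <- cumsum(r$lengths)
  srt <- end - r$lengths + 1
  data.frame(srt = srt[r$values], end = end[r$values])
}

bfi <- c(0.9, 0.8, 0.5, 0.3, 0.4, 0.8, 0.9, 0.6, 0.7, 0.95)  # made-up BFI series
delineate_events(bfi, BFI_Th = 0.75)  # two events: days 3-5 and days 8-9
```

Raising BFI_Th makes more days fall below the threshold, which is why a higher threshold makes it "easier" to delineate events.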
The aim is to understand how water availability affects differences in yield between Big G and Little g during hydrologic events and the intervening periods (non-event baseflow periods). Therefore, we identify events from the Big G data, apply event/non-event time periods to the little g data, then explore G-g deviations during both events and non-events as a function of
Time series of hydrologic events at Big G, identified using eventBaseflow().
9.2.4.3 Check congruency
Applying Big G event/non-event periods to little g time series data inherently assumes that event/non-event periods would be similarly delineated for little g. If this assumption does not hold, then non-event little g flow would be included in event periods, and vice-versa. How well does this assumption hold?
Code
sites <- c("West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook")

dat_little2 <- dat_little %>% filter(site_name == "Jimmy Brook")
events_little <- eventBaseflow(dat_little2$Yield_filled_mm, BFI_Th = 0.75)

# define positions of non-events
srt <- c(1)
end <- c(events_little$srt[1] - 1)
for (i in 2:(dim(events_little)[1])) {
  srt[i] <- events_little$end[i - 1] + 1
  end[i] <- events_little$srt[i] - 1
}
nonevents <- data.frame(tibble(srt, end) %>%
                          mutate(len = end - srt) %>%
                          filter(len >= 0) %>%
                          select(-len) %>%
                          add_row(srt = events_little$end[dim(events_little)[1]] + 1,
                                  end = dim(dat_little2)[1]))

# create vectors of binary event/non-event and event IDs
isevent_vec <- rep(2, times = dim(dat_little2)[1])
eventid_vec <- rep(NA, times = dim(dat_little2)[1])
for (i in 1:dim(events_little)[1]) {
  isevent_vec[c(events_little[i, 1]:events_little[i, 2])] <- 1
  eventid_vec[c(events_little[i, 1]:events_little[i, 2])] <- i
}

# create vector of non-event IDs
noneventid_vec <- rep(NA, times = dim(dat_little2)[1])
for (i in 1:dim(nonevents)[1]) {
  noneventid_vec[c(nonevents[i, 1]:nonevents[i, 2])] <- i
}

# create vector of "agnostic events": combined hydro events and non-events
agnevents <- rbind(events_little %>% select(srt, end) %>% mutate(event = 1),
                   nonevents %>% mutate(event = 0)) %>%
  arrange(srt)
agneventid_vec <- c()
for (i in 1:dim(agnevents)[1]) {
  agneventid_vec[c(agnevents[i, 1]:agnevents[i, 2])] <- i
}

# add event/non-event vectors to little g data
dat_little2 <- dat_little2 %>%
  mutate(isevent = isevent_vec,
         eventid = eventid_vec,
         noneventid = noneventid_vec,
         agneventid = agneventid_vec,
         little_event_yield = ifelse(isevent_vec == 1, Yield_filled_mm, NA),
         little_nonevent_yield = ifelse(isevent_vec == 2, Yield_filled_mm, NA),
         little_event_quick = little_event_yield - bf) %>%
  rename(little_yield = Yield_filled_mm, little_bf = bf, little_bfi = bfi)

dat_big %>%
  select(date, big_event_yield) %>%
  left_join(dat_little2 %>% select(date, little_event_yield)) %>%
  dygraph() %>%
  dyRangeSelector() %>%
  dyAxis("y", label = "Daily yield (mm)") %>%
  dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
Time series of yield for Big G (West Brook NWIS) and one little g site (Jimmy Brook) during hydrologic events as delineated for Big G and little g, respectively.
Whether or not events align between G and g is highly variable. In some cases, g events begin/end prior to G events, and in other cases g events begin/end later than G events. In some cases g events are shorter than G events, and in other cases they are longer. In many cases, events are perfectly matched. Importantly, peaks in yield are almost always synchronous.
Ultimately, does this matter given that we are simply using this as a method to break up our data? Furthermore, the framing of the ~entire project is that Big G is the reference by which to compare all little g’s. In this sense, applying event/non-event periods derived from G to g matches this perspective.
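If a more quantitative congruency check is wanted, a simple metric is the fraction of days on which the two delineations agree. event_agreement() below is a hypothetical helper (not part of the analysis above), assuming the 1 = event / 2 = non-event coding used for isevent_vec:

```r
# Hypothetical helper: fraction of days on which the Big G and little g
# classifications agree (coding assumed: 1 = event, 2 = non-event)
event_agreement <- function(isevent_G, isevent_g) {
  mean(isevent_G == isevent_g)
}

# made-up 10-day classifications for illustration
isevent_G <- c(2, 2, 1, 1, 1, 2, 2, 1, 1, 2)
isevent_g <- c(2, 1, 1, 1, 2, 2, 2, 1, 1, 2)
event_agreement(isevent_G, isevent_g)  # 0.8: disagreement on days 2 and 5
```

Values near 1 would indicate that applying G-derived periods to g loses little information.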
9.2.5 Join events to Little g
Code
# fudge factor to deal with 0 flow, when logged
fudge <- 0.01

# join big g events to little g and summarize by event (cumulative, mean, and quantiles)
dat_wb2 <- dat_little %>%
  filter(date >= min(dat_big$date) & date <= max(dat_big$date)) %>%
  left_join(dat_big %>% select(-site_name)) %>%
  group_by(site_name, basin, subbasin, agneventid) %>%
  summarise(eventlen = n(),
            mindate = min(date),
            isevent = unique(isevent),
            # cumulative
            yield_little_cumul = sum(Yield_filled_mm + fudge),
            yield_big_cumul = sum(big_yield + fudge),
            yield_little_cumul_log = log(yield_little_cumul),
            yield_big_cumul_log = log(yield_big_cumul),
            # mean
            yield_little_mean = mean(Yield_filled_mm + fudge),
            yield_big_mean = mean(big_yield + fudge),
            yield_little_mean_log = mean(log(Yield_filled_mm + fudge)),
            yield_big_mean_log = mean(log(big_yield + fudge)),
            # quantiles
            yield_little_q10_log = quantile(log(Yield_filled_mm + fudge), probs = 0.10),
            yield_little_q50_log = quantile(log(Yield_filled_mm + fudge), probs = 0.50),
            yield_little_q90_log = quantile(log(Yield_filled_mm + fudge), probs = 0.90),
            yield_big_q10_log = quantile(log(big_yield + fudge), probs = 0.10),
            yield_big_q50_log = quantile(log(big_yield + fudge), probs = 0.50),
            yield_big_q90_log = quantile(log(big_yield + fudge), probs = 0.90)) %>%
  ungroup() %>%
  mutate(site_name = factor(site_name, levels = c("West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook")),
         site_name_cd = as.numeric(site_name)
         # z_yield_big_cumul_log = as.numeric(scale(yield_big_cum_log, center = TRUE, scale = TRUE)),
         # z_yield_big_mean_log = as.numeric(scale(yield_big_mean_log, center = TRUE, scale = TRUE))
         )
dat_wb2
# A tibble: 708 × 22
site_name basin subbasin agneventid eventlen mindate isevent
<fct> <chr> <chr> <int> <int> <date> <dbl>
1 Avery Brook West Brook West Brook 5 6 2020-02-20 2
2 Avery Brook West Brook West Brook 6 6 2020-02-26 1
3 Avery Brook West Brook West Brook 7 4 2020-03-03 1
4 Avery Brook West Brook West Brook 8 5 2020-03-07 2
5 Avery Brook West Brook West Brook 9 4 2020-03-12 1
6 Avery Brook West Brook West Brook 10 2 2020-03-16 2
7 Avery Brook West Brook West Brook 11 4 2020-03-18 1
8 Avery Brook West Brook West Brook 12 4 2020-03-22 2
9 Avery Brook West Brook West Brook 13 11 2020-03-26 1
10 Avery Brook West Brook West Brook 14 3 2020-04-06 2
# ℹ 698 more rows
# ℹ 15 more variables: yield_little_cumul <dbl>, yield_big_cumul <dbl>,
# yield_little_cumul_log <dbl>, yield_big_cumul_log <dbl>,
# yield_little_mean <dbl>, yield_big_mean <dbl>, yield_little_mean_log <dbl>,
# yield_big_mean_log <dbl>, yield_little_q10_log <dbl>,
# yield_little_q50_log <dbl>, yield_little_q90_log <dbl>,
# yield_big_q10_log <dbl>, yield_big_q50_log <dbl>, …
Code
# write to file
write_csv(dat_wb2, "C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/EcoDrought-Analysis/Event Delineation/Basin files/EcoDrought_Data_EventNonEvent_WestBrookonly.csv")
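The fudge constant added throughout the summaries guards against zero-flow days, which would otherwise map to -Inf under the log transform:

```r
# why a fudge factor is needed before logging zero-flow days
fudge <- 0.01     # as set above
log(0)            # -Inf: a zero-flow day breaks log-scale summaries
log(0 + fudge)    # finite once the fudge factor is added (log(0.01) ~ -4.61)
```

The trade-off is that the fudge compresses differences among the very smallest flows, so its size matters most during low-flow non-event periods.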
View pairs plot of summary statistics, for Big G only (limit to one little g site to avoid pseudoreplication):
p1 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p2 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p3 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p4 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p5 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
ggarrange(p1, p2, p3, p4, p5, nrow = 2, ncol = 3)
Relationship between yield at Big G and yield at little g, summarized during event/non-event periods with five different metrics.
View relationship between Big G and little g, colored by event/non-event and faceted by site. For most sites (except maybe Obear Brook), G-g relationships are identical between events and non-events.
dat_wb2 %>%
  mutate(year = as.factor(year(mindate))) %>%
  ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = year, color = year)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm") +
  facet_wrap(~site_name)
9.2.6.1 To log or not to log?
Does using log-transformed data fundamentally change the nature of the relationship? What does the g~G relationship look like if we use data on the original scale? Means, for example:
Code
dat_wb2 %>%
  mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>%
  ggplot(aes(x = yield_big_mean, y = yield_little_mean, group = site_name, color = site_name)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE)
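One thing to keep in mind when comparing scales: the event summaries above store the mean of logged yields (e.g., yield_little_mean_log), which is not the log of the mean yield; by Jensen's inequality the mean of logs is never larger. So the log-scale and original-scale plots are not just relabeled axes. A minimal base-R illustration with made-up yields:

```r
# mean-of-logs vs log-of-mean for a hypothetical event's daily yields (mm)
y <- c(0.2, 0.5, 1.0, 4.0, 12.0)
mean(log(y))  # mean of logs, as computed in yield_little_mean_log
log(mean(y))  # log of the arithmetic mean; >= mean(log(y)) for positive data
```

The gap between the two grows with within-event variability, so events with flashy hydrographs shift most between the two scales.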
9.3 Spread Creek
9.3.1 Trim to focal sites
Code
dat_sp <- dat %>%
  filter(basin == "Snake River", site_name != "Pacific Creek at Moran NWIS") %>%
  mutate(site_name = recode(site_name,
                            "Leidy Creek Mouth NWIS" = "Leidy Creek Mouth",
                            "SF Spread Creek Lower NWIS" = "SF Spread Creek Lower")) %>%
  group_by(site_name, date) %>%
  summarise(Yield_filled_mm = mean(Yield_filled_mm),
            flow_mean_filled_cms = mean(flow_mean_filled_cms),
            area_sqkm = mean(area_sqkm),
            WaterYear = unique(WaterYear)) %>%
  ungroup()
unique(dat_sp$site_name)
[1] "Grizzly Creek" "Grouse Creek" "Leidy Creek Mouth"
[4] "Leidy Creek Upper" "NF Spread Creek Lower" "NF Spread Creek Upper"
[7] "Rock Creek" "SF Spread Creek Lower" "SF Spread Creek Upper"
[10] "Spread Creek Dam"
Set baseflow separation (for the Lyne-Hollick one-parameter digital recursive filter) and event delineation parameters (as in Wasko and Guo, 2022)
alp: alpha filter parameter, higher values “lower” the estimated baseflow (thus making it difficult to delineate events)
numpass: number of passes. Ladson et al. (2013) recommend 3 passes for daily data (default in baseflowB() function)
thresh: baseflow index threshold for event delineation, higher threshold values make it “easier” to delineate events
Code
alp <- 0.925
numpass <- 3
thresh <- 0.85
9.3.3 Baseflow separation
Perform baseflow separation. See Wasko and Guo (2022, Hydrological Processes) for recommendations on parameterization, as different algorithms and alpha values can produce different results. Section 4.1: “…users should not simply use recommended filter parameter values from literature in combination with any baseflow filter code without verification of their choice of filter parameter. As the digital baseflow filter is not tied to any physical realism (Nathan and McMahon, 1990) and a larger fractional baseflow may aid identification of events—even if this is not strictly baseflow as per its definition (Linsley et al., 1958)…”.
It is not actually necessary to do this separately, as baseflow separation is conducted in the event delineation function internally. But it is helpful to view the results and explore how different parameterizations yield different baseflow contributions.
p1 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p2 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p3 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p4 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p5 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
ggarrange(p1, p2, p3, p4, p5, nrow = 2, ncol = 3)
Relationship between yield at Big G and yield at little g, summarized during event/non-event periods with five different metrics.
View relationship between Big G and little g, colored by event/non-event and faceted by site. For most sites, G-g relationships are identical between events and non-events.
dat_sp2 %>%
  mutate(year = as.factor(year(mindate))) %>%
  ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = year, color = year)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm") +
  facet_wrap(~site_name)
9.3.6.1 To log or not to log?
Does using log-transformed data fundamentally change the nature of the relationship? What does the g~G relationship look like if we use data on the original scale? Means, for example:
Code
dat_sp2 %>%
  mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>%
  ggplot(aes(x = yield_big_mean, y = yield_little_mean, group = site_name, color = site_name)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE)
9.4 DEPRECATED
9.5 Sensitivity analyses
9.5.1 Missing data
Many of the EcoDrought time series data are incomplete. At some sites, discharge data are available only during the summer and/or fall periods, and at other sites, time series data are interrupted by malfunctioning sensors and/or ice formation (“ice spikes”). So how does the length of the time series affect baseflow separation (and subsequent event identification)? Wasko and Guo (2022) use a 67-day time series of flow to demonstrate the utility of the hydroEvents package, suggesting that digital baseflow separation techniques may be valid for relatively short time series.
Here, I perform a simple sensitivity analysis to explore the effect of time series length on the results of baseflow separation: essentially, I perform baseflow separation on successively smaller subsets of the data. With the default parameters, the minimum number of days/observations needed is 31. This is because the default number of points reflected at the start and end of the data (r) is 30. Reflection allows bf/bfi to be calculated over the entire period of record, as the underlying baseflow separation equations otherwise suffer issues of “warm-up” and “cool-down” as the recursive filter is moved forward and backward over the dataset (Ladson et al. 2013, Australian Journal of Water Resources). baseflowB() uses a default reflection period of 30, which Ladson et al. (2013) found to “provide a realistic baseflow response for the start and end of the actual flow data”.
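The reflection step can be sketched in base R. reflect_series() below is an illustrative stand-in (the function name and exact mirroring are assumptions for illustration, not the package's internal code), but it shows why r = 30 implies a minimum of 31 observations:

```r
# Sketch of end-point reflection used to warm up/cool down the filter;
# illustrative only -- baseflowB() handles this internally (default r = 30)
reflect_series <- function(q, r = 30) {
  n <- length(q)
  stopifnot(n >= r + 1)  # with r = 30, at least 31 observations are required
  c(rev(q[2:(r + 1)]),            # mirror of the first r points (endpoint excluded)
    q,
    rev(q[(n - r):(n - 1)]))      # mirror of the last r points
}

q <- seq_len(40)
length(reflect_series(q))  # 40 + 2 * 30 = 100
```

The filter is then run over the padded series and the reflected ends are discarded, so only the real data carry bf/bfi estimates.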
9.5.1.1 Compare baseflow
Divergence in baseflow among datasets is a result of the reflected data of the shorter dataset not matching the actual data of the longer dataset. As a result, divergence really only occurs at the end of each time series and is generally small in magnitude.
9.5.1.2 Compare baseflow index
The story here is essentially the same as above: divergence is ~minimal and restricted to the end of each time series. However, we note that divergence in BFI appears to increase as absolute flow/baseflow decreases, because small differences in absolute space become much larger in relative space when absolute values are small.
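That scale effect is easy to make concrete: holding the absolute divergence in baseflow fixed, the implied divergence in BFI grows as total flow shrinks (the values below are made up for illustration):

```r
# a fixed absolute baseflow divergence expressed as a BFI difference
bf_diff <- 0.05         # hypothetical absolute divergence in baseflow (mm)
flow <- c(10, 1, 0.1)   # progressively smaller total daily flow (mm)
bf_diff / flow          # 0.005 0.050 0.500 -- same mm difference, growing BFI gap
```

So the larger end-of-series BFI divergence at low flows need not reflect a worse baseflow estimate in absolute terms.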
View relationship between Big G and among-site/event-specific standard deviation in little g.
---title: "Hydro Event Delineation"---Purpose: Conduct baseflow separation and delineate hydrologic events to model in Gg framework```{r echo=FALSE, message=FALSE}library(tidyverse)library(sf)library(mapview)library(knitr)library(fasstr)library(RColorBrewer)library(scales)library(dygraphs)library(hydroEvents)library(GGally)library(dataRetrieval)library(ggpubr)```## Data### Site info and daily data```{r}# site information and locationssiteinfo <-read_csv("C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/Data/EcoDrought_SiteInformation.csv")siteinfo_sp <-st_as_sf(siteinfo, coords =c("long", "lat"), crs =4326)mapview(siteinfo_sp, zcol ="designation")# flow/yield (and temp) data dat <-read_csv("C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/Data/EcoDrought_FlowTempData_DailyWeekly.csv") %>%filter(!site_name %in%c("WoundedBuckCreek", "Brackett Creek"))# add water/climate year variablesdat <-add_date_variables(dat, dates = date, water_year_start =4)str(dat)```## West Brook### Trim to focal sites```{r}dat_wb <- dat %>%filter(site_name %in%c("West Brook NWIS", "West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook"))```View daily yield at Big G```{r}#| fig-cap: "Total daily streamflow in yield (mm), over the period of record"dat_wb %>%filter(site_name =="West Brook NWIS") %>%select(date, Yield_filled_mm) %>%dygraph() %>%dyRangeSelector() %>%dyAxis("y", label ="Daily yield (mm)") %>%dyOptions(fillGraph =TRUE, drawGrid =FALSE, axisLineWidth =1.5)```View daily yield at all sites```{r}dat_wb %>%filter(site_name !="West Brook NWIS") %>%select(date, site_name, Yield_filled_mm) %>%spread(key = site_name, value = Yield_filled_mm) %>%left_join(dat_wb %>%filter(site_name =="West Brook NWIS") %>%select(date, Yield_filled_mm) %>%rename(West_Brook_NWIS = Yield_filled_mm)) %>%dygraph() %>%dyRangeSelector() 
%>%dyAxis("y", label ="Yield (mm)") %>%dyOptions(colors =c(hcl.colors(9, "Zissou 1"), "black")) ```### Set parametersSet baseflow separation (for the Lyne-Hollick one-parameter digital recursive filter) and event delineation paramters (as in Wasko and Guo, 2022)* *alp*: alpha filter parameter, higher values "lower" the estimated baseflow (thus making it difficult to delineate events)* *numpass*: number of passes. Ladson *et al*. (2013) recommend 3 passes for daily data (default in baseflowB() function)* *thresh*: baseflow index threshold for event delineation, higher threshold values make it "easier" to delineate events```{r}alp <-0.925numpass <-3thresh <-0.75```### Baseflow separationPerform baseflow separation. See Wasko and Guo (2022, Hydrological Processes) for recommendations on parameterization, as different algorithms and alpha values can produce different results. Section 4.1: "...users should not simply use recommended filter parameter values from literature in combination with any baseflow filter code without verification of their choice of filter parameter. As the digital baseflow filter is not tied to any physical realism (Nathan and McMahon, 1990) and a larger fractional baseflow may aid identification of events—even if this is not strictly baseflow as per its definition (Linsley et al., 1958)...".It is not actually necessary to do this separately, as baseflow separation is conducted in the event delineation function internally. 
But it is helpful to view the results and explore how different parameterizations yield different baseflow contributions.```{r}dat_wb_bf <- dat_wb %>%filter(!is.na(Yield_filled_mm)) %>%select(site_name, basin, subbasin, WaterYear, date, Yield_filled_mm, flow_mean_filled_cms, area_sqkm) %>%group_by(site_name) %>%mutate(bf =baseflowB(Yield_filled_mm, alpha = alp, passes = numpass)$bf, bfi =baseflowB(Yield_filled_mm, alpha = alp, passes = numpass)$bfi) %>%ungroup()head(dat_wb_bf)``````{r}#| fig-cap: "Total daily streamflow in yield (mm), baseflow in yield, and fractional baseflow index over the period of record for site West Brook NWIS"dat_wb_bf %>%filter(site_name =="West Brook NWIS") %>%select(date, Yield_filled_mm, bf, bfi) %>%dygraph() %>%dyRangeSelector() %>%dyAxis("y", label ="Yield (mm)") %>%dyOptions(fillGraph =TRUE, drawGrid =FALSE, axisLineWidth =1.5)```### Event identificationThere are multiple options for methods of event identification. Section 4.2 in Wasko and Guo (2022): It can be generalized that, if the aim of the analysis is to identify independent streamflow maxima then eventMaxima() and eventMinima() work well and indeed have been traditionally employed for this task. If identifying independent small events becomes difficult, or the aim is to identify wet spells, eventBaseflow() may be preferred." In our case, small events are likely important, so we use eventBaseflow() with a ~large values for BFI_Th to capture those small events.The aim is to understand how water availability affects differences in yield between Big G and Little g during hydrologic events and the intervening periods (non-event baseflow periods). 
Therefore, we identify events from the Big G data, apply event/non-event time periods to the little g data, then explore G-g deviations during both events and non-events as a function of #### Identify eventsSeparate Big G and Little G data```{r}dat_big <- dat_wb_bf %>%filter(site_name =="West Brook NWIS")dat_little <- dat_wb_bf %>%filter(site_name !="West Brook NWIS")```Identify events at Big G```{r}events <-eventBaseflow(dat_big$Yield_filled_mm, BFI_Th = thresh, bfi = dat_big$bfi)events <- events %>%mutate(len = end - srt +1)(events)```Plot Big G events using the default function```{r fig.height=4, fig.width = 9}#| fig-cap: "Time series of hydrologic events at Big G, identified using eventBaseflow()."par(mar = c(3,3,1,1))plotEvents(dat_big$Yield_filled_mm, events = events, main = NA, xlab = "Time-step", ylab = "Yield")```#### Tidy eventsNow add variables to the Big G time series data specifying events and non-events ```{r}# define positions of non-eventssrt <-c(1)end <-c(events$srt[1]-1)for (i in2:(dim(events)[1])) { srt[i] <- events$end[i-1]+1 end[i] <- events$srt[i]-1}nonevents <-data.frame(tibble(srt, end) %>%mutate(len = end - srt) %>%filter(len >=0) %>%select(-len) %>%add_row(srt = events$end[dim(events)[1]]+1, end =dim(dat_big)[1]))# create vectors of binary event/non-event and event IDsisevent_vec <-rep(2, times =dim(dat_big)[1])eventid_vec <-rep(NA, times =dim(dat_big)[1])for (i in1:dim(events)[1]) { isevent_vec[c(events[i,1]:events[i,2])] <-1 eventid_vec[c(events[i,1]:events[i,2])] <- i}# create vector of non-event IDsnoneventid_vec <-rep(NA, times =dim(dat_big)[1])for (i in1:dim(nonevents)[1]) { noneventid_vec[c(nonevents[i,1]:nonevents[i,2])] <- i }# create vector of "agnostic events": combined hydro events and non-eventsagnevents <-rbind(events %>%select(srt, end) %>%mutate(event =1), nonevents %>%mutate(event =0)) %>%arrange((srt))agneventid_vec <-c()for (i in1:dim(agnevents)[1]){ agneventid_vec[c(agnevents[i,1]:agnevents[i,2])] <- i }# add 
event/non-event vectors to Big G datadat_big <- dat_big %>%mutate(isevent = isevent_vec, eventid = eventid_vec,noneventid = noneventid_vec,agneventid = agneventid_vec,big_event_yield =ifelse(isevent_vec ==1, Yield_filled_mm, NA),big_nonevent_yield =ifelse(isevent_vec ==2, Yield_filled_mm, NA),big_event_quick = big_event_yield - bf) %>%rename(big_yield = Yield_filled_mm, big_bf = bf, big_bfi = bfi, big_flow = flow_mean_filled_cms, big_area_sqkm = area_sqkm)(dat_big)``````{r}#| fig-cap: "Time series of hydrologic events at Big G, identified using eventBaseflow()."dat_big %>%select(date, big_yield, big_bf, big_event_yield, big_nonevent_yield) %>%dygraph() %>%dyRangeSelector() %>%dyAxis("y", label ="Yield (mm)") %>%dyOptions(fillGraph =TRUE, drawGrid =FALSE, axisLineWidth =1.5)```#### Check congruencyApplying Big G event/non-event periods to little g time series data inherently assumes that event/non-event periods would be similarly delineated for little g. If this assumption does not hold, then non-event little g flow would be included in event periods, and vice-versa. How well does this assumption hold? ```{r}#| fig-cap: "Time series of yield for Big G (West Brook NWIS) and one little g site (Jimmy Brook) during hydrologic events as delineated for Big G and little g, respectively. 
"sites <-c("West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook")dat_little2 <- dat_little %>%filter(site_name =="Jimmy Brook")events_little <-eventBaseflow(dat_little2$Yield_filled_mm, BFI_Th =0.75)# define positions of non-eventssrt <-c(1)end <-c(events_little$srt[1]-1)for (i in2:(dim(events_little)[1])) { srt[i] <- events_little$end[i-1]+1 end[i] <- events_little$srt[i]-1}nonevents <-data.frame(tibble(srt, end) %>%mutate(len = end - srt) %>%filter(len >=0) %>%select(-len) %>%add_row(srt = events_little$end[dim(events_little)[1]]+1, end =dim(dat_little2)[1]))# create vectors of binary event/non-event and event IDsisevent_vec <-rep(2, times =dim(dat_little2)[1])eventid_vec <-rep(NA, times =dim(dat_little2)[1])for (i in1:dim(events_little)[1]) { isevent_vec[c(events_little[i,1]:events_little[i,2])] <-1 eventid_vec[c(events_little[i,1]:events_little[i,2])] <- i}# create vector of non-event IDsnoneventid_vec <-rep(NA, times =dim(dat_little2)[1])for (i in1:dim(nonevents)[1]) { noneventid_vec[c(nonevents[i,1]:nonevents[i,2])] <- i }# create vector of "agnostic events": combined hydro events and non-eventsagnevents <-rbind(events_little %>%select(srt, end) %>%mutate(event =1), nonevents %>%mutate(event =0)) %>%arrange((srt))agneventid_vec <-c()for (i in1:dim(agnevents)[1]){ agneventid_vec[c(agnevents[i,1]:agnevents[i,2])] <- i }# add event/non-event vectors to Big G datadat_little2 <- dat_little2 %>%mutate(isevent = isevent_vec, eventid = eventid_vec,noneventid = noneventid_vec,agneventid = agneventid_vec,little_event_yield =ifelse(isevent_vec ==1, Yield_filled_mm, NA),little_nonevent_yield =ifelse(isevent_vec ==2, Yield_filled_mm, NA),little_event_quick = little_event_yield - bf) %>%rename(little_yield = Yield_filled_mm, little_bf = bf, little_bfi = bfi)dat_big %>%select(date, big_event_yield) %>%left_join(dat_little2 %>%select(date, 
little_event_yield)) %>%
  dygraph() %>%
  dyRangeSelector() %>%
  dyAxis("y", label = "Daily yield (mm)") %>%
  dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

Whether or not events align between G and g is highly variable. In some cases, g events begin/end before G events; in other cases, g events begin/end after G events. Some g events are shorter than G events, others are longer, and many are perfectly matched. Importantly, peaks in yield are almost always synchronous. Ultimately, does this matter given that we are simply using this as a method to break up our data? Furthermore, the framing of the entire project is that Big G is the reference against which all little g's are compared. In this sense, applying event/non-event periods derived from G to g matches this perspective.

### Join events to Little g

```{r}
# fudge factor to deal with 0 flow, when logged
fudge <- 0.01

# join big g events to little g and summarize by event (cumulative, mean, and quantiles)
dat_wb2 <- dat_little %>%
  filter(date >= min(dat_big$date) & date <= max(dat_big$date)) %>%
  left_join(dat_big %>% select(-site_name)) %>%
  group_by(site_name, basin, subbasin, agneventid) %>%
  summarise(eventlen = n(),
            mindate = min(date),
            isevent = unique(isevent),
            # cumulative
            yield_little_cumul = sum(Yield_filled_mm + fudge),
            yield_big_cumul = sum(big_yield + fudge),
            yield_little_cumul_log = log(yield_little_cumul),
            yield_big_cumul_log = log(yield_big_cumul),
            # mean
            yield_little_mean = mean(Yield_filled_mm + fudge),
            yield_big_mean = mean(big_yield + fudge),
            yield_little_mean_log = mean(log(Yield_filled_mm + fudge)),
            yield_big_mean_log = mean(log(big_yield + fudge)),
            # quantiles
            yield_little_q10_log = quantile(log(Yield_filled_mm + fudge), probs = 0.10),
            yield_little_q50_log = quantile(log(Yield_filled_mm + fudge), probs = 0.50),
            yield_little_q90_log = quantile(log(Yield_filled_mm + fudge), probs = 0.90),
            yield_big_q10_log = quantile(log(big_yield + fudge), probs = 0.10),
            yield_big_q50_log = quantile(log(big_yield + fudge), probs = 0.50),
            yield_big_q90_log = quantile(log(big_yield + fudge), probs = 0.90)) %>%
  ungroup() %>%
  mutate(site_name = factor(site_name, levels = c("West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook")),
         site_name_cd = as.numeric(site_name)
         # z_yield_big_cumul_log = as.numeric(scale(yield_big_cum_log, center = TRUE, scale = TRUE)),
         # z_yield_big_mean_log = as.numeric(scale(yield_big_mean_log, center = TRUE, scale = TRUE))
         )
dat_wb2

# write to file
write_csv(dat_wb2, "C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/EcoDrought-Analysis/Event Delineation/Basin files/EcoDrought_Data_EventNonEvent_WestBrookonly.csv")
```

View pairs plot of summary statistics, for Big G only (limit to one little g site to avoid pseudo-replication):

```{r fig.width=9, fig.height=9, results='hide'}
myplot <- ggpairs(dat_wb2 %>% filter(site_name == "Avery Brook") %>% select(mindate, yield_big_cumul_log, yield_big_mean_log, yield_big_q10_log, yield_big_q50_log, yield_big_q90_log))
rctib <- tibble(r = c(3,4,4,5,5,5,6,6,6,6), c = c(2,2,3,2,3,4,2,3,4,5))
for (i in 1:nrow(rctib)) {
  myplot[rctib$r[i], rctib$c[i]] <- myplot[rctib$r[i], rctib$c[i]] + geom_abline(intercept = 0, slope = 1, color = "red")
}
myplot
```

### Plot gG relationships

```{r, fig.height=6, fig.width=9}
#| fig-cap: "Relationship between yield at Big G and yield at little g, summarized during event/non-event periods with five different metrics."
p1 <- dat_wb2 %>% ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p2 <- dat_wb2 %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p3 <- dat_wb2 %>% ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p4 <- dat_wb2 %>% ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p5 <- dat_wb2 %>% ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
ggarrange(p1, p2, p3, p4, p5, nrow = 2, ncol = 3)
```

View relationship between Big G and little g, color by event/non-event, facet by site.
For most sites (except maybe Obear Brook), G-g relationships are identical between events and non-events.

```{r}
myplotlist <- list()
# cumulative
myplotlist[[1]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# mean
myplotlist[[2]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 10% quantile
myplotlist[[3]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 50% quantile
myplotlist[[4]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 90% quantile
myplotlist[[5]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
```

::: panel-tabset

##### Cumul.

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[1]]
```

##### Mean

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[2]]
```

##### Q10

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[3]]
```

##### Q50

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[4]]
```

##### Q90

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[5]]
```

:::

Means, as above, but color by year.

```{r fig.width=8, fig.height=7}
dat_wb2 %>% mutate(year = as.factor(year(mindate))) %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = year, color = year)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
```

#### To log or not to log?

Does using log-transformed data fundamentally change the nature of the relationship? What does the g~G relationship look like if we use data on the original scale? Means, for example:

```{r}
dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_mean, y = yield_little_mean, group = site_name, color = site_name)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F)
```

## Spread Creek

### Trim to focal sites

```{r}
dat_sp <- dat %>%
  filter(basin == "Snake River", site_name != "Pacific Creek at Moran NWIS") %>%
  mutate(site_name = recode(site_name, "Leidy Creek Mouth NWIS" = "Leidy Creek Mouth", "SF Spread Creek Lower NWIS" = "SF Spread Creek Lower")) %>%
  group_by(site_name, date) %>%
  summarise(Yield_filled_mm = mean(Yield_filled_mm), flow_mean_filled_cms = mean(flow_mean_filled_cms), area_sqkm = mean(area_sqkm), WaterYear = unique(WaterYear)) %>%
  ungroup()
unique(dat_sp$site_name)
```

View daily yield at Big G

```{r}
#| fig-cap: "Total daily streamflow in yield (mm), over the period of record"
temp <- dat_sp %>% filter(site_name == "Spread Creek Dam")
temp <- fill_missing_dates(temp, dates = date)
temp %>% select(date, Yield_filled_mm) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Daily yield (mm)") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

View daily yield at all sites

```{r}
dat_sp %>% filter(site_name != "Spread Creek Dam") %>% select(date, site_name, Yield_filled_mm) %>% spread(key = site_name, value = Yield_filled_mm) %>% left_join(dat_sp %>% filter(site_name == "Spread Creek Dam") %>% select(date, Yield_filled_mm) %>% rename(Spread_Creek_Dam = Yield_filled_mm)) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Yield (mm)") %>% dyOptions(colors = c(hcl.colors(9, "Zissou 1"), "black"))
```

### Set parameters

Set baseflow separation (for the Lyne-Hollick one-parameter digital recursive filter) and event delineation parameters (as in Wasko and Guo, 2022):

* *alp*: alpha filter parameter; higher values "lower" the estimated baseflow (thus making it more difficult to delineate events)
* *numpass*: number of passes. Ladson *et al.* (2013) recommend 3 passes for daily data (the default in the baseflowB() function)
* *thresh*: baseflow index threshold for event delineation; higher threshold values make it "easier" to delineate events

```{r}
alp <- 0.925
numpass <- 3
thresh <- 0.85
```

### Baseflow separation

Perform baseflow separation. See Wasko and Guo (2022, Hydrological Processes) for recommendations on parameterization, as different algorithms and alpha values can produce different results. Section 4.1: "...users should not simply use recommended filter parameter values from literature in combination with any baseflow filter code without verification of their choice of filter parameter. As the digital baseflow filter is not tied to any physical realism (Nathan and McMahon, 1990) and a larger fractional baseflow may aid identification of events—even if this is not strictly baseflow as per its definition (Linsley et al., 1958)...".

It is not actually necessary to do this separately, as baseflow separation is conducted internally by the event delineation function.
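Before settling on a value for *alp*, it can help to see how sensitive the overall baseflow fraction is to the filter parameter. The sketch below uses a bare-bones single forward pass of the Lyne-Hollick filter on a toy series; this is an illustration of the sensitivity only, not the baseflowB() internals (baseflowB() adds multiple passes and end-point reflection, so its values will differ):

```r
# Single forward pass of the Lyne-Hollick filter (sketch; not the baseflowB() internals)
lh_baseflow <- function(q, alpha) {
  qf <- numeric(length(q))  # filtered quickflow
  qf[1] <- q[1]
  for (t in 2:length(q)) {
    qf[t] <- alpha * qf[t - 1] + (1 + alpha) / 2 * (q[t] - q[t - 1])
  }
  pmin(pmax(q - qf, 0), q)  # baseflow, constrained to [0, q]
}

set.seed(1)
q <- 5 + abs(cumsum(rnorm(200)))  # toy daily yield series (made-up values)

# overall baseflow fraction under candidate alpha values
fracs <- sapply(c(0.900, 0.925, 0.950, 0.980), function(a) sum(lh_baseflow(q, a)) / sum(q))
fracs
```

Even on this toy series, the baseflow fraction shifts with alpha, which is the behavior the Wasko and Guo (2022) caution is about: the parameter choice should be verified against the record being filtered.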
But it is helpful to view the results and explore how different parameterizations yield different baseflow contributions.

```{r}
yrs <- unlist(unique(dat_sp %>% filter(site_name == "Spread Creek Dam", !is.na(Yield_filled_mm)) %>% select(WaterYear)))
datlist <- list()
for (i in 1:length(yrs)) {
  datlist[[i]] <- dat_sp %>%
    filter(site_name == "Spread Creek Dam", WaterYear == yrs[i], !is.na(Yield_filled_mm)) %>%
    mutate(bf = baseflowB(Yield_filled_mm, alpha = alp, passes = numpass)$bf,
           bfi = baseflowB(Yield_filled_mm, alpha = alp, passes = numpass)$bfi)
}
dat_sp_bf <- do.call(rbind, datlist)
```

```{r}
#| fig-cap: "Total daily streamflow in yield (mm), baseflow in yield, and fractional baseflow index over the period of record for site Spread Creek Dam (Big G)"
fill_missing_dates(dat_sp_bf, dates = date) %>% select(date, Yield_filled_mm, bf, bfi) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Yield (mm)") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

### Event identification

#### Identify events

Separate Big G and little g data

```{r}
dat_big <- dat_sp_bf %>% filter(site_name == "Spread Creek Dam")
dat_little <- dat_sp %>% filter(site_name != "Spread Creek Dam")
```

Identify events at Big G and tidy, looping over years (i.e., chunks of data)

```{r}
yrs <- unlist(unique(dat_sp %>% filter(site_name == "Spread Creek Dam", !is.na(Yield_filled_mm)) %>% select(WaterYear)))
biglist <- list()
for (k in 1:length(yrs)) {
  ddd <- dat_sp_bf %>% filter(WaterYear == yrs[k])
  events <- eventBaseflow(ddd$Yield_filled_mm, BFI_Th = thresh, bfi = ddd$bfi)
  events <- events %>% mutate(len = end - srt + 1)
  # define positions of non-events
  srt <- c(1)
  end <- c(events$srt[1] - 1)
  for (i in 2:(dim(events)[1])) {
    srt[i] <- events$end[i-1] + 1
    end[i] <- events$srt[i] - 1
  }
  nonevents <- tibble(srt, end) %>% mutate(len = end - srt) %>% filter(len >= 0) %>% select(-len)
  if (events$end[dim(events)[1]] != dim(ddd)[1]) {
    nonevents <- nonevents %>% add_row(srt = events$end[dim(events)[1]] + 1, end = dim(ddd)[1])
  }
  nonevents <- data.frame(nonevents)
  # create vectors of binary event/non-event and event IDs
  isevent_vec <- rep(2, times = dim(ddd)[1])
  eventid_vec <- rep(NA, times = dim(ddd)[1])
  for (i in 1:dim(events)[1]) {
    isevent_vec[c(events[i,1]:events[i,2])] <- 1
    eventid_vec[c(events[i,1]:events[i,2])] <- i
  }
  # create vector of non-event IDs
  noneventid_vec <- rep(NA, times = dim(ddd)[1])
  for (i in 1:dim(nonevents)[1]) {
    noneventid_vec[c(nonevents[i,1]:nonevents[i,2])] <- i
  }
  # create vector of "agnostic events": combined hydro events and non-events
  agnevents <- rbind(events %>% select(srt, end) %>% mutate(event = 1), nonevents %>% mutate(event = 0)) %>% arrange(srt)
  agneventid_vec <- c()
  for (i in 1:dim(agnevents)[1]) {
    agneventid_vec[c(agnevents[i,1]:agnevents[i,2])] <- i
  }
  # add event/non-event vectors to Big G data
  biglist[[k]] <- ddd %>%
    mutate(isevent = isevent_vec,
           eventid = eventid_vec,
           noneventid = noneventid_vec,
           agneventid = agneventid_vec,
           big_event_yield = ifelse(isevent_vec == 1, Yield_filled_mm, NA),
           big_nonevent_yield = ifelse(isevent_vec == 2, Yield_filled_mm, NA),
           big_event_quick = big_event_yield - bf) %>%
    mutate(agneventid = paste(yrs[k], "_", agneventid, sep = "")) %>%
    rename(big_yield = Yield_filled_mm, big_bf = bf, big_bfi = bfi, big_flow = flow_mean_filled_cms, big_area_sqkm = area_sqkm)
}
dat_big <- do.call(rbind, biglist)
dat_big
```

```{r}
#| fig-cap: "Time series of hydrologic events at Big G, identified using eventBaseflow()."
fill_missing_dates(dat_big, dates = date) %>% select(date, big_yield, big_bf, big_event_yield, big_nonevent_yield) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Yield (mm)") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

### Join events to Little g

```{r}
# fudge factor to deal with 0 flow, when logged
fudge <- 0.01

# join big g events to little g and summarize by event (cumulative, mean, and quantiles)
dat_sp2 <- dat_little %>%
  filter(date >= min(dat_big$date) & date <= max(dat_big$date)) %>%
  left_join(dat_big %>% select(-site_name)) %>%
  group_by(site_name, agneventid) %>%
  summarise(eventlen = n(),
            mindate = min(date),
            isevent = unique(isevent),
            n_little = sum(!is.na(Yield_filled_mm)),
            n_big = sum(!is.na(big_yield)),
            # cumulative
            yield_little_cumul = sum(Yield_filled_mm + fudge, na.rm = TRUE),
            yield_big_cumul = sum(big_yield + fudge, na.rm = TRUE),
            yield_little_cumul_log = log(yield_little_cumul),
            yield_big_cumul_log = log(yield_big_cumul),
            # mean
            yield_little_mean = mean(Yield_filled_mm + fudge),
            yield_big_mean = mean(big_yield + fudge),
            yield_little_mean_log = mean(log(Yield_filled_mm + fudge), na.rm = TRUE),
            yield_big_mean_log = mean(log(big_yield + fudge), na.rm = TRUE),
            # quantiles
            yield_little_q10_log = quantile(log(Yield_filled_mm + fudge), probs = 0.10, na.rm = TRUE),
            yield_little_q50_log = quantile(log(Yield_filled_mm + fudge), probs = 0.50, na.rm = TRUE),
            yield_little_q90_log = quantile(log(Yield_filled_mm + fudge), probs = 0.90, na.rm = TRUE),
            yield_big_q10_log = quantile(log(big_yield + fudge), probs = 0.10, na.rm = TRUE),
            yield_big_q50_log = quantile(log(big_yield + fudge), probs = 0.50, na.rm = TRUE),
            yield_big_q90_log = quantile(log(big_yield + fudge), probs = 0.90, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(site_name = factor(site_name, levels = c("Rock Creek", "SF Spread Creek Lower", "Grouse Creek", "SF Spread Creek Upper", "Leidy Creek Mouth", "Leidy Creek Upper", "NF Spread Creek Lower", "NF Spread Creek Upper", "Grizzly Creek")),
         site_name_cd = as.numeric(site_name)
         # z_yield_big_cumul_log = as.numeric(scale(yield_big_cum_log, center = TRUE, scale = TRUE)),
         # z_yield_big_mean_log = as.numeric(scale(yield_big_mean_log, center = TRUE, scale = TRUE))
         ) %>%
  filter(!is.na(agneventid))
dat_sp2 <- dat_sp2[dat_sp2$n_little == dat_sp2$n_big,]
dat_sp2

# write to file
write_csv(dat_sp2, "C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/EcoDrought-Analysis/Event Delineation/Basin files/EcoDrought_Data_EventNonEvent_SpreadCreekonly.csv")
```

View pairs plot of summary statistics, for Big G only (limit to one little g site to avoid pseudo-replication):

```{r fig.width=9, fig.height=9, results='hide'}
myplot <- ggpairs(dat_sp2 %>% select(mindate, yield_big_cumul_log, yield_big_mean_log, yield_big_q10_log, yield_big_q50_log, yield_big_q90_log))
rctib <- tibble(r = c(3,4,4,5,5,5,6,6,6,6), c = c(2,2,3,2,3,4,2,3,4,5))
for (i in 1:nrow(rctib)) {
  myplot[rctib$r[i], rctib$c[i]] <- myplot[rctib$r[i], rctib$c[i]] + geom_abline(intercept = 0, slope = 1, color = "red")
}
myplot
```

### Plot gG relationships

```{r, fig.height=6, fig.width=9}
#| fig-cap: "Relationship between yield at Big G and yield at little g, summarized during event/non-event periods with five different metrics."
p1 <- dat_sp2 %>% ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p2 <- dat_sp2 %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p3 <- dat_sp2 %>% ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p4 <- dat_sp2 %>% ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p5 <- dat_sp2 %>% ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
ggarrange(p1, p2, p3, p4, p5, nrow = 2, ncol = 3)
```

View relationship between Big G and little g, color by event/non-event, facet by site. For most sites, G-g relationships are identical between events and non-events.

```{r}
myplotlist <- list()
# cumulative
myplotlist[[1]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# mean
myplotlist[[2]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 10% quantile
myplotlist[[3]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 50% quantile
myplotlist[[4]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 90% quantile
myplotlist[[5]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
```

::: panel-tabset

#### Cumul.

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[1]]
```

#### Mean

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[2]]
```

#### Q10

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[3]]
```

#### Q50

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[4]]
```

#### Q90

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[5]]
```

:::

Means, as above, but color by year.

```{r fig.width=8, fig.height=7}
dat_sp2 %>% mutate(year = as.factor(year(mindate))) %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = year, color = year)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
```

#### To log or not to log?

Does using log-transformed data fundamentally change the nature of the relationship? What does the g~G relationship look like if we use data on the original scale? Means, for example:

```{r}
dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_mean, y = yield_little_mean, group = site_name, color = site_name)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F)
```

## DEPRECATED

## Sensitivity analyses

### Missing data

Many of the EcoDrought time series data are incomplete. At some sites, discharge data are available only during the summer and/or fall periods, and at other sites, time series data are interrupted due to malfunctioning sensors and/or ice formation ("ice spikes"). So how does the length of the time series affect baseflow separation (and subsequent event identification)? Wasko and Guo (2022) use a 67-day time series of flow to demonstrate the utility of the *hydroEvents* package, suggesting digital baseflow separation techniques may be valid for relatively short time series. Here, I perform a simple sensitivity analysis to explore the effect of time series length on the results of baseflow separation.
Essentially, perform baseflow separation on increasingly smaller subsets of the data. With the default parameters, the minimum number of days/observations needed is 31. This is because the default number of points reflected at the start and end of the data (r) is 30. Reflection allows bf/bfi to be calculated over the entire period of record, as the underlying baseflow separation equations result in "issues of 'warm-up' and 'cool-down' as the recursive filter is moved forward and backward over the dataset" (Ladson et al. 2013, Australian Journal of Water Resources). *baseflowB()* uses a default reflection period of 30, which Ladson et al. (2013) found to "provide a realistic baseflow response for the start and end of the actual flow data".

```{r eval=FALSE, include=FALSE}
dat_wb_sens <- dat_wb %>%
  mutate(bf_c = baseflowB(Yield_filled_mm)$bf, bfi_c = baseflowB(Yield_filled_mm)$bfi) %>%
  select(date, bf_c, bfi_c) %>%
  left_join(dat_wb[1:(365*3),] %>% mutate(bf_3y = baseflowB(Yield_filled_mm)$bf, bfi_3y = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_3y, bfi_3y)) %>%
  left_join(dat_wb[1:(365),] %>% mutate(bf_1y = baseflowB(Yield_filled_mm)$bf, bfi_1y = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_1y, bfi_1y)) %>%
  left_join(dat_wb[1:(182),] %>% mutate(bf_6m = baseflowB(Yield_filled_mm)$bf, bfi_6m = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_6m, bfi_6m)) %>%
  left_join(dat_wb[1:(90),] %>% mutate(bf_3m = baseflowB(Yield_filled_mm)$bf, bfi_3m = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_3m, bfi_3m)) %>%
  left_join(dat_wb[1:(35),] %>% mutate(bf_1m = baseflowB(Yield_filled_mm)$bf, bfi_1m = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_1m, bfi_1m))
```

#### Compare baseflow

Divergence in baseflow among datasets is a result of the reflected data of the shorter dataset not matching the actual data of the longer dataset. As a result, divergence really only occurs at the end of each time series and is generally small in magnitude.
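The 31-observation minimum follows directly from the reflection construction: with r = 30, a series must have at least r + 1 points for the mirrored pad to exist. A minimal sketch of the padding itself (an illustration of the idea, not the baseflowB() internals):

```r
# Reflect r points at each end of a series (sketch; not the baseflowB() internals)
reflect_pad <- function(q, r = 30) {
  stopifnot(length(q) >= r + 1)  # need at least r+1 points to mirror r of them
  c(rev(q[2:(r + 1)]),           # mirror about the first point
    q,
    rev(q[(length(q) - r):(length(q) - 1)]))  # mirror about the last point
}

q <- seq_len(40)      # toy series of 40 "days"
qp <- reflect_pad(q)
length(qp)            # 40 + 2*30 = 100
qp[1:3]               # mirrored start: 31, 30, 29
```

The filter is then run over the padded series and the 2r reflected values are discarded, so the "warm-up" and "cool-down" transients fall on synthetic data rather than the record itself.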
```{r eval=FALSE, include=FALSE}
#| fig-cap: "Time series of baseflow derived from datasets of different lengths."
dat_wb_sens %>% select(date, bf_c, bf_3y, bf_1y, bf_6m, bf_3m, bf_1m) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Daily baseflow in yield (mm)") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

```{r message = FALSE, warning = FALSE, eval=FALSE, include=FALSE}
#| fig-cap: "Pairs plot of baseflow derived from datasets of different lengths. Red lines are 1:1."
ggpairs(dat_wb_sens %>% select(bf_c, bf_3y, bf_1y, bf_6m, bf_3m, bf_1m)) + geom_abline(intercept = 0, slope = 1, color = "red")
```

#### Compare baseflow index

The story here is essentially the same as above: divergence is minimal and restricted to the end of each time series. However, we note that divergence in BFI appears to increase as absolute flow/baseflow decreases, because small differences in absolute space become much larger in relative space when absolute values are small.

```{r eval=FALSE, include=FALSE}
#| fig-cap: "Time series of baseflow index derived from datasets of different lengths."
dat_wb_sens %>% select(date, bfi_c, bfi_3y, bfi_1y, bfi_6m, bfi_3m, bfi_1m) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Daily baseflow index") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

```{r message = FALSE, warning = FALSE, eval=FALSE, include=FALSE}
#| fig-cap: "Pairs plot of baseflow index derived from datasets of different lengths. Red lines are 1:1."
ggpairs(dat_wb_sens %>% select(bfi_c, bfi_3y, bfi_1y, bfi_6m, bfi_3m, bfi_1m)) + geom_abline(intercept = 0, slope = 1, color = "red")
```

View relationship between Big G and among-site/event-specific standard deviation in little g.

```{r eval=FALSE, include=FALSE}
#| fig-cap: "Derived from log cumulative yield"
dat_wb2 %>% select(agneventid, yield_big_cumul_log, yield_little_cumul_log) %>% group_by(agneventid) %>% summarize(unq_big = unique(yield_big_cumul_log), sd_little = sd(yield_little_cumul_log)) %>% ggplot(aes(x = unq_big, y = sd_little)) + geom_point() + geom_smooth() + ylim(0, 1.5)
```

```{r eval=FALSE, include=FALSE}
#| fig-cap: "Derived from mean log yield"
dat_wb2 %>% select(agneventid, yield_big_mean_log, yield_little_mean_log) %>% group_by(agneventid) %>% summarize(unq_big = unique(yield_big_mean_log), sd_little = sd(yield_little_mean_log)) %>% ggplot(aes(x = unq_big, y = sd_little)) + geom_point() + geom_smooth() + ylim(0, 1.5)
```
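The event-level summary behind those last plots (one Big G value per event, among-site spread of little g) can be sketched without dplyr; this is a self-contained toy with made-up values, where the column names mirror dat_wb2 but nothing else is taken from the real data:

```r
# Toy event-level table mimicking dat_wb2 (values are made up)
set.seed(2)
toy <- data.frame(
  agneventid = rep(paste0("2020_", 1:4), each = 3),  # 4 events x 3 little g sites
  yield_big_mean_log = rep(rnorm(4), each = 3),      # one Big G value per event
  yield_little_mean_log = rnorm(12)                  # site-specific little g values
)

# per-event Big G value and among-site SD of little g
unq_big <- tapply(toy$yield_big_mean_log, toy$agneventid, unique)
sd_little <- tapply(toy$yield_little_mean_log, toy$agneventid, sd)
summ <- data.frame(agneventid = names(unq_big),
                   unq_big = as.numeric(unq_big),
                   sd_little = as.numeric(sd_little))
summ
```

Note that `unique()` as a per-event summary relies on Big G being constant within an event, which is guaranteed by the join structure above; if it were not, `tapply()` would return a list and the `as.numeric()` step would fail loudly.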